Start Small, Build Complete: Effective and Efficient Semantic Table Interpretation using TableMiner
Abstract
This article introduces TableMiner, the first Semantic Table Interpretation method able to annotate Web tables using an incremental, bootstrapping learning approach seeded by automatically selected ‘partial’ content from tables. The basic principle is to create initial, partial annotations of a table using some, as opposed to all, of the content in the table. The partial outcome then serves as ‘stepping stones’ to guide interpretation of the remaining content in the table, followed by a process of iteratively refining results on the entire table to create the final annotations. To construct feature representations, TableMiner uses various types of contextual information both inside and outside tables, including pre-defined semantic markup (e.g., RDFa/Microdata annotations) within some webpages, which, to the best of the author’s knowledge, has never been used in Natural Language Processing tasks. Evaluated on the largest collection of datasets known to date, comprising four datasets with a total of more than 15,000 tables, TableMiner consistently outperforms four baselines under all experimental settings. On the two most representative datasets, covering multiple domains and various table schemata, it improves F1 by between 1 and 42 percentage points, depending on the specific task. Compared against the state of the art, it also obtains higher results on similar datasets. The bootstrapping learning strategy seeded by partial table content also makes TableMiner very efficient. Empirically, it reduces the data to be processed by up to 66% and saves up to 29% of CPU time when compared against classic methods that ‘exhaustively’ process the entire table content to build features for interpretation.
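The ‘start small, build complete’ principle can be illustrated with a minimal Python sketch. It is not TableMiner's implementation: the toy knowledge base `TOY_KB`, the functions `classify_column_incrementally` and `annotate_cells`, the vote-count scoring, and the `patience` stability check are all hypothetical stand-ins for the features and convergence criterion the method actually uses. The sketch only shows the shape of the idea: label a column from a few cells, stop once the label stabilizes, then use that label to guide the interpretation of every cell.

```python
from collections import Counter

# Hypothetical toy knowledge base: cell text -> candidate concepts.
TOY_KB = {
    "Paris": ["Person", "City"],
    "London": ["City"],
    "Berlin": ["City"],
    "Madrid": ["City"],
    "Rome": ["City"],
    "Oslo": ["City"],
}


def lookup(cell):
    """Return candidate concepts for a cell (empty list if unknown)."""
    return TOY_KB.get(cell, [])


def classify_column_incrementally(cells, lookup, patience=2):
    """Label a column from partial content.

    Cells are processed one at a time; processing stops as soon as the
    top-scoring concept has stayed the same for `patience` consecutive
    cells. Returns (best_concept, number_of_cells_processed).
    """
    scores = Counter()
    previous_best = None
    stable = 0
    processed = 0
    for cell in cells:
        processed += 1
        for concept in lookup(cell):
            scores[concept] += 1  # assumption: one vote per candidate
        if not scores:
            continue
        # Sort first so ties are broken alphabetically and deterministically.
        best = max(sorted(scores), key=scores.get)
        if best == previous_best:
            stable += 1
        else:
            previous_best = best
            stable = 0
        if stable >= patience:
            break  # the label has stabilized; skip the remaining cells
    return previous_best, processed


def annotate_cells(cells, lookup, column_concept):
    """Use the column label as a 'stepping stone' to annotate each cell."""
    annotations = {}
    for cell in cells:
        candidates = lookup(cell)
        if column_concept in candidates:
            annotations[cell] = column_concept
        elif candidates:
            annotations[cell] = candidates[0]
        else:
            annotations[cell] = None
    return annotations


column = ["Paris", "London", "Berlin", "Madrid", "Rome", "Oslo"]
label, used = classify_column_incrementally(column, lookup)
cell_labels = annotate_cells(column, lookup, label)
print(label, used, cell_labels["Paris"])
```

In this toy run the column label is fixed after three of the six cells, and the ambiguous cell "Paris" is then resolved to the column's concept rather than to its first candidate. That is the kind of saving and guidance the abstract describes, shown here at a much smaller scale and with far simpler scoring.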
Similar resources
Effective and efficient Semantic Table Interpretation using TableMiner+
This article introduces TableMiner+, a Semantic Table Interpretation method that annotates Web tables in a way that is both effective and efficient. Built on our previous work TableMiner, the extended version advances state-of-the-art in several ways. First, it improves annotation accuracy by making innovative use of various types of contextual information both inside and outside tables as features for ...
Visualizing Semantic Table Annotations with TableMiner+
This paper describes an extension of the TableMiner system, an open source Semantic Table Interpretation system that annotates Web tables using Linked Data in an effective and efficient approach. It adds a graphical user interface to TableMiner, to facilitate the visualization and correction of automatically generated annotations. This makes TableMiner an ideal tool for the semi-automatic creat...
A Tool for Creating and Visualizing Semantic Annotations on Relational Tables
Semantically annotating content from relational tables on the Web is a crucial task towards realizing the vision of the Semantic Web. However, there is a lack of open source, user-friendly tools to facilitate this. This paper describes an extension of the TableMiner system, an open source Semantic Table Interpretation system that automatically annotates Web tables using Linked Data in an effect...
Towards Odalic, a Semantic Table Interpretation Tool in the ADEQUATe Project
The goal of the ADEQUATe project is to assess and improve quality of the (tabular) open data being published at two Austrian open data portals – https://www.data.gv.at and https://www.opendataportal.at. The goal of the quality improvement technique described in this paper is to semantically interpret such tabular data and publish them as Linked Data; this basically means to (1) classify columns...
Interpretation of In-air Output Ratio of Wedged Fields in Different Measurement Conditions
Introduction: The head scatter factor (Sc) is one of the important parameters for monitor unit (MU) calculation. Multiple factors affect the Sc values, such as head structures, backscattering into dose monitoring chambers, wedges and so on. This study aimed to investigate the variations of Sc with different build-up cap materials, wall thickness, Source to ...
Publication date: 2014